A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Author: The ENCODE Project Consortium
Publication date: 01/01/2011

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

Carolina Digital Repository

Modeling associations between genetic markers using Bayesian networks

Author: Altshuler
Browning
C. D. Maciel
E. Villanueva
Liu
Mueller
Nothnagel
Pritchard
Scheet
The ENCODE Project Consortium
Thomas
Thomas
Tishkoff
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) have been proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated it. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, inference and parameter estimation for such models remain computationally challenging.
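For reference, the "static" LD coefficients that the abstract contrasts with generative models include D, D' and r². The following is a minimal Python sketch computing them from the four two-locus haplotype frequencies of two biallelic loci. The function name `ld_measures` and its input layout are illustrative assumptions, not taken from the paper, and this is not the authors' Bayesian-network method.

```python
def ld_measures(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies.

    Hypothetical helper for illustration. Inputs are the frequencies of
    haplotypes AB, Ab, aB and ab; they are rescaled to sum to 1, so raw
    haplotype counts also work.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")
    p_AB, p_Ab, p_aB, p_ab = (x / total for x in (p_AB, p_Ab, p_aB, p_ab))

    # Marginal allele frequencies of A (locus 1) and B (locus 2).
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB

    # D: deviation of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # D' (Lewontin): D scaled by its maximum attainable magnitude given the
    # allele frequencies; the sign of D is kept.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the allelic states at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

For example, `ld_measures(0.5, 0.0, 0.0, 0.5)` describes complete LD and returns `(0.25, 1.0, 1.0)`, while equal haplotype frequencies give zero for all three measures. Each call summarizes only the present-day haplotype table, which is the limitation the abstract points out.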

Crossref

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

Hardy-Weinberg Equilibrium Testing of Biological Ascertainment for Mendelian Randomization Studies

Author: Cupples
Davey-Smith
Gu
Guarnieri
Hardy
Hedrick
Hingorani
Ian N. M. Day
Kavvoura
Santiago Rodriguez
The ENCODE Project Consortium
The International HapMap Consortium
The Wellcome Trust Case Control Consortium
Tom R. Gaunt
Weinberg
Publication venue: Oxford University Press
Publication date: 15/02/2009

Mendelian randomization (MR) permits causal inference about the relationships between exposures and a disease. It can be compared with randomized controlled trials. Whereas in a randomized controlled trial the randomization occurs at entry into the trial, in MR the randomization occurs during gamete formation and conception. Several factors, including time since conception and sampling variation, are relevant to the interpretation of an MR test. Particularly important is consideration of the “missingness” of genotypes, which can arise by chance, from genotyping errors, or from clinical ascertainment. Testing for Hardy-Weinberg equilibrium (HWE) is a genetic approach that permits evaluation of missingness. In this paper, the authors demonstrate evidence of nonconformity with HWE in real data. They also perform simulations to characterize the sensitivity of HWE tests to missingness. Unresolved missingness could lead to a false rejection of causality in an MR investigation of trait-disease association. These results indicate that large-scale studies, very high-quality genotyping data, and detailed knowledge of the life-course genetics of the alleles/genotypes studied will largely mitigate this risk. The authors also present a Web program (http://www.oege.org/software/hwe-mr-calc.shtml) for estimating possible missingness and an approach to evaluating missingness under different genetic models.
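HWE testing can be illustrated with the common one-degree-of-freedom chi-square goodness-of-fit test on biallelic genotype counts. The following sketch is a generic textbook version, not necessarily the test used by the authors or by their Web program, and the function name `hwe_chi2` is hypothetical. Exact tests are often preferred when genotype counts are small.

```python
import math


def hwe_chi2(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Chi-square (1 df) goodness-of-fit test for Hardy-Weinberg equilibrium.

    Hypothetical helper for illustration. Returns (statistic, p_value) for
    observed counts of genotypes AA, Aa and aa.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A sample of 25 AA, 50 Aa and 25 aa fits HWE exactly (statistic 0, p-value 1). If all 50 heterozygotes go missing, `hwe_chi2(25, 0, 25)` gives a statistic of 50. This is the kind of deviation that selective, non-random missingness can produce.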

Crossref

PubMed Central

Explore Bristol Research

Mapping the Shh long-range regulatory domain

Author: Amano
Belloni
Bickmore
Chuong
Davis
Dixon
Echelard
Epstein
Hecksher-Sorensen
Jeong
Jeong
Klopocki
Kokubu
Lettice
Lettice
Lettice
Lettice
Lettice
Lettice
Liu
Marinić
Mates
Montavon
Nagy
Niedermaier
Osoegawa
Paek
Riddle
Ruf
Sagai
Sagai
Sagai
Sharpe
Sharpe
Shen
Smallwood
Spitz
Sun
Symmons
Symmons
The ENCODE Consortium Project
Tsukiji
Publication venue: The Company of Biologists
Publication date: 01/10/2014

Coordinated gene expression controlled by long-distance enhancers is orchestrated by DNA regulatory sequences involving transcription factors and layers of control mechanisms. The Shh gene and its well-established regulators exemplify a genomic organization in which enhancers reside in a large gene desert extending into neighbouring genes to control the spatiotemporal pattern of expression. Exploiting the local hopping activity of the Sleeping Beauty transposon, we dispersed the lacZ reporter gene throughout the Shh region to systematically map the genomic features responsible for expression activity. We found that enhancer activities are retained inside a genomic region that corresponds to the topologically associating domain (TAD) defined by Hi-C. This domain of approximately 900 kb is in an open conformation over its length and is generally susceptible to all Shh enhancers. Like the distal enhancers, an enhancer residing within the second intron of Shh activates the reporter gene at distances of hundreds of kilobases, suggesting that both proximal and distal enhancers can survey the Shh topological domain to recognise potential promoters. The widely expressed Rnf32 gene, which lies within the Shh domain, evades enhancer activity by a process that may be common to other housekeeping genes residing in large regulatory domains. Finally, the boundaries of the Shh TAD do not mark the absolute limits of enhancer activity, as expression activity is lost stepwise at a number of genomic positions at the edges of these domains.

Crossref

PubMed Central

Edinburgh Research Explorer

SNPdetector: A Software Tool for Sensitive and Accurate SNP Detection

Author: David A Wheeler
Gabor Marth
Imtiaz Yakub
Jinghui Zhang
Kenneth H Buetow
Paul P Liu
Raman Sood
Richard A Gibbs
Sharon Wei
The ENCODE Consortium
The International HapMap Consortium
William Rowe
Publication venue: Public Library of Science
Publication date: 01/10/2005

Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in applying this method to high-throughput screening. It is therefore critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has very low false positive and false negative rates. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in the CAST/Ei strain was observed at two of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebrafish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platforms and is publicly available (http://lpg.nci.nih.gov).
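As a quick arithmetic check, the quoted 0.04% allele frequency appears to correspond to a single mutant allele among the 2 × 1,236 alleles carried by the diploid zebrafish screened. This reading is an interpretation and is not stated in the abstract:

```latex
\frac{1}{2 \times 1236} = \frac{1}{2472} \approx 4.05 \times 10^{-4} \approx 0.04\%
```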

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

GIVE: portable genome browsers for personal websites

Author: Alvin Zheng
B Sridhar
B Sridhar
C Tyner
D Barrios
D Comer
E Lieberman-Aiden
E Sharma
F Ozsolak
F Yue
FH Biase
JD Buenrostro
JG Aw
JT Robinson
LD Stein
ME Skinner
MJ Fullwood
Qiuyang Wu
R Bayer
R Li
R Mourad
S Carrere
Sheng Zhong
TC Nguyen
The ENCODE Project Consortium
VW Zhou
WJ Kent
X Li
X Zhou
Xiaoyi Cao
Z Lu
Zhangming Yan
Publication venue: eScholarship, University of California
Publication date: 01/07/2018

The growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add interactive visualization of multiple types of genomics data, including genome annotation, "linear" quantitative data, and genome interaction data, to a personal webpage. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data, which can be copied and pasted into the user's personal website or saved and shared with collaborators. GIVE is available at: https://www.givengine.org/

Crossref

Directory of Open Access Journals

eScholarship - University of California

The vertebrate genome annotation (Vega) database

Author: Allcock
Ashurst
Debenham
Gilbert
Hart
Horton
Hubbard
J. G. R. Gilbert
J. L. Harrow
K. Howe
Kasprzyk
L. G. Wilming
Prochazka
Renard
S. Trevanion
Sambrook
Sprague
T. Hubbard
The ENCODE Project Consortium
Traherne
Wicker
Publication venue: Oxford University Press
Publication date: 01/01/2008

The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) was first made public in 2004 and was designed for viewing manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of annotated human loci has more than doubled to close to 33 000, and the database now contains comprehensive annotation of 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human and regions of comparative interest in pig and dog that are unique to Vega.

CiteSeerX

Crossref

PubMed Central

King's Research Portal

A probabilistic generative model for GO enrichment analysis

Author: Alexa
Bader
Bar-Joseph
Cheung
Davis
Deutscher
Eisen
Ernst
Ernst
Ewing
Gasch
Gerard J. Nau
Giot
Grassme
Grossmann
Harbison
Ihmels
Itamar Simon
Jones
Kellis
Leem
Mewes
Mukherjee
Nasmyth
Natarajan
Nau
Navarre
Palomero
Park
Ren
Rojas
Roni Rosenfeld
Spellman
The ENCODE Project Consortium
The Gene Ontology Consortium
The Toxicogenomics Research Consortium
Thomas
Yong Lu
Ziv Bar-Joseph
Publication venue: Oxford University Press
Publication date: 01/01/2008

The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of hypotheses tested in each study. In addition, categories often overlap both with their direct parents/descendants and with other, distant categories in the hierarchical structure. This makes it hard to determine whether the identified significant categories represent different functional outcomes or rather a redundant view of the same biological processes. To overcome these problems, we developed a generative probabilistic model that identifies a (small) subset of categories that together explain the selected gene set. Our model accommodates noise and errors in the selected gene set and in GO. Using controlled GO data, our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human, our method correctly identified both general and specific enriched categories that were overlooked by other methods.
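The "current methods" mentioned above typically test each GO category independently, for example with a one-sided hypergeometric test. Repeating such a test for every category is what creates the multiple-hypothesis and redundancy problems the abstract describes. The following is a minimal sketch of that per-category baseline, not the authors' generative model. The function name and the parameters N, K, n and k are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for a single GO category.

    Hypothetical helper for illustration.
    N: genes in the background, K: background genes annotated to the category,
    n: genes in the selected set, k: selected genes annotated to the category.
    """
    # math.comb returns 0 when asked to choose more items than are available,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

In a background of 10 genes with 5 annotated to a category, drawing a selected set of 5 that contains all 5 annotated genes has probability 1/252. Because overlapping parent and child categories share genes, their p-values computed this way are strongly correlated, which leads to the redundant results the generative model is meant to avoid.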

Crossref

PubMed Central

Genome-wide associations of gene expression variation in humans

Author: Andrew G Clark
Barbara E Stranger
Brenda Kahl
David Allison
Emmanouil T Dermitzakis
ENCODE Project Consortium
Mark J Minichiello
Matthew S Forrest
Panagiotis Deloukas
Robert Lyle
Samuel Deutsch
Sarah Hunt
Simon Tavaré
Stylianos E Antonarakis
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2005

The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest, including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction: Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Our analyses indicate that the signal proximal (cis) to the genes of interest is more abundant and more stable across statistical methodologies than distal and trans signals. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
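The abstract names Bonferroni correction, false discovery rate (FDR) control and permutations but gives no details. The following is a minimal sketch of the first two. It assumes the Benjamini-Hochberg step-up procedure for FDR, which is a common choice but is not confirmed by the abstract. Both function names are hypothetical, and the permutation approach is not shown.

```python
def bonferroni(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Finds the largest rank k such that p_(k) <= k * q / m and rejects the
    hypotheses with the k smallest p-values.
    """
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

With p-values `[0.01, 0.02, 0.03, 0.5]` at level 0.05, Bonferroni rejects only the first hypothesis, while Benjamini-Hochberg rejects the first three. This shows why FDR control is less conservative when many SNP-expression pairs are tested.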

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

FigShare